Geometric-topological Based Arabic Character Recognition, a New Approach
Abstract
Optical Character Recognition (OCR) is a long-standing problem of great interest in the field of pattern recognition. In this paper, a new algorithm based on morphological structure is proposed for Arabic character recognition. The proposed method relies on center of mass calculations and is invariant to the size, translation and rotation of the target image. In addition, topology-based landmarks, namely intersection pixels marking the intersections of loops and multiple strokes, as well as end points, are used: the centers of mass of these points are computed within the individual quadrants of the circles enclosing the characters. After initial pre-processing operations (binarization, resizing, normalization, noise removal and skeletonization), the total number of intersection pixels and the total number of end points are determined and stored. The character image is then encircled and divided into four quadrants. The center of mass of the character image and the centers of mass of each of its four quadrants are determined, and the Euclidean distances (ED) of the intersection and end points in each quadrant from these centers of mass are calculated. These quantities are determined for both the target and prototype images, and the best match is the character with the minimum ED. Results show that the presented method opens up a new direction for dealing with the complex problems of OCR.
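The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the neighbor-count rule for detecting landmarks, the normalization by the enclosing-circle radius and the two toy shapes are all assumptions made for illustration. The sketch starts from an already skeletonized image, uses axis-aligned quadrants, and does not attempt rotation invariance.

```python
import math

def parse(rows):
    """Turn a list of strings ('#' = ink) into a set of (row, col) skeleton pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}

def neighbor_count(pixels, p):
    """Number of 8-connected ink neighbors of pixel p."""
    r, c = p
    return sum((r + dr, c + dc) in pixels
               for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0))

def landmarks(pixels):
    """Assumed rule: end points have 1 neighbor, intersection pixels have >= 3.
    This is a simplification; pixels adjacent to a junction may also qualify."""
    ends = [p for p in pixels if neighbor_count(pixels, p) == 1]
    crossings = [p for p in pixels if neighbor_count(pixels, p) >= 3]
    return ends, crossings

def center_of_mass(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def quadrant(p, center):
    """Index 0..3 of the axis-aligned quadrant of p around the center of mass."""
    return (0 if p[0] < center[0] else 2) + (0 if p[1] < center[1] else 1)

def features(pixels):
    """Landmark counts plus, per quadrant, the mean distance of that quadrant's
    landmarks to the quadrant's own center of mass, scaled by the radius of the
    circle enclosing the character (an assumed way to reduce size dependence)."""
    pts = sorted(pixels)
    center = center_of_mass(pts)
    radius = max(math.dist(p, center) for p in pts) or 1.0
    ends, crossings = landmarks(pixels)
    vec = [float(len(ends)), float(len(crossings))]
    for q in range(4):
        ink = [p for p in pts if quadrant(p, center) == q]
        marks = [p for p in ends + crossings if quadrant(p, center) == q]
        if ink and marks:
            qc = center_of_mass(ink)
            vec.append(sum(math.dist(m, qc) for m in marks) / (len(marks) * radius))
        else:
            vec.append(0.0)
    return vec

def classify(target, prototypes):
    """Return the prototype name whose feature vector is nearest (minimum ED)."""
    t = features(target)
    return min(prototypes, key=lambda name: math.dist(t, features(prototypes[name])))

# Hypothetical toy prototypes, not real Arabic characters.
PLUS = parse(["..#..", "..#..", "#####", "..#..", "..#.."])
BAR = parse([".....", ".....", "#####", ".....", "....."])
prototypes = {"plus": PLUS, "bar": BAR}

# A translated copy of the plus shape should still match its prototype.
shifted = {(r + 3, c + 4) for r, c in PLUS}
print(classify(shifted, prototypes))  # prints "plus"
```

Because every feature is measured relative to a center of mass, shifting the character leaves the feature vector unchanged, which is the translation invariance the abstract refers to. A real system would replace the toy shapes with skeletonized character images and would need a more careful junction detector.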
Similar resources
A Graph-Based Segmentation and Feature-Extraction Framework for Arabic Text Recognition
This paper presents a graph-based framework for the segmentation of Arabic text. The same framework is used to extract font independent structural features from the text that are used in the recognition. The major contribution of this paper is a new graph-based structural segmentation approach based on the topological relation between the baseline and the line adjacency graph (LAG) representati...
Lexicon Reduction for Urdu/Arabic Script Based Character Recognition: A Multilingual OCR
Arabic script character recognition is a challenging task due to the complexity of the script and the huge number of ligatures. We present a method for the development of multilingual Arabic script OCR (Optical Character Recognition) and lexicon reduction for Arabic script and its derivative languages. The objective of the proposed method is to overcome the large dataset Urdu and similar scripts by using...
Abstract: This paper describes a method of Automatic Arabic Character Recognition (ACR)
This paper describes a method of Automatic Arabic Character Recognition (ACR). The system is built around four distinct image-processing modules: a processing module, a segmentation module, a recognition module and a module for detecting classification symbols. For classification, we have used Fuzzy Logic (FL), Genetic Algorithm (GA), and Expert S...
A Survey of Robust hybrid approach for Arabic character recognition
In this paper we present an Arabic Character Recognition (ACR) system dedicated to automatic reading. The developed system is a fuzzy classifier: Fuzzy Logic (FL) combined with an Expert System (ES) to extract the topological and contextual information of each printed character. This combination is very useful to improve the powerful of Hybrid Intellig...
A Proposed Hybrid Technique for Recognizing Arabic Characters
Optical character recognition systems improve human-machine interaction and are urgently required by many governmental and commercial departments. Considerable progress has been achieved in recognition techniques for Latin and Chinese characters. By contrast, Arabic Optical Character Recognition (AOCR) is still lagging although the interest and research in this area is becoming more inten...
Publication date: 2017